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Abstract — Bilevel optimization problems are a class of chal- 
lenging optimization problems, which contain two levels of 
optimization tasks. In these problems, the optimal solutions to the 
lower level problem become possible feasible candidates to the 
upper level problem. Such a requirement makes the optimization 
problem difficult to solve, and has kept the researchers busy 
towards devising methodologies, which can efficiently handle the 
problem. Despite the efforts, there hardly exists any effective 
methodology, which is capable of handling a complex bilevel 
problem. In this paper, we introduce bilevel evolutionary algo- 
rithm based on quadratic approximations (BLEAQ) of optimal 
lower level variables with respect to the upper level variables. 
The approach is capable of handling bilevel problems with 
different kinds of complexities in relatively smaller number of 
function evaluations. Ideas from classical optimization have been 
hybridized with evolutionary methods to generate an efficient 
optimization algorithm for generic bilevel problems. The efficacy 
of the algorithm has been shown on two sets of test problems. 
The first set is a recently proposed SMD test set, which contains 
problems with controllable complexities, and the second set 
contains standard test problems collected from the literature. The 
proposed method has been evaluated against two benchmarks, 
and the performance gain is observed to be significant. 

Index Terms — Bilevel optimization, Evolutionary algorithms, 
Quadratic approximations. 



I. Introduction 

Bilevel optimization is a branch of optimization, which 
contains a nested optimization problem within the constraints 
of the outer optimization problem. The outer optimization task 
is usually referred as the upper level task, and the nested inner 
optimization task is referred as the lower level task. The lower 
level problem appears as a constraint, such that only an optimal 
solution to the lower level optimization problem is a possible 
feasible candidate to the upper level optimization problem. 
Such a requirement makes bilevel optimization problems dif- 
ficult to handle and have kept researchers and practitioners 
busy alike. In the field of classical optimization, a number 
of studies have been conducted on bilevel programming |j4|, 
ED, m. Approximate solution techniques are commonly 
employed to handle bilevel problems with simplifying as- 
sumptions like smoothness, linearity or convexity. Some of 
the classical approaches commonly used to handle bilevel 
problems include the Karush-Kuhn-Tucker approach O, ifTSll . 
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Branch-and-bound techniques |l2l| and the use of penalty func- 
tions 111- Most of these solution methodologies are rendered 
inapplicable, as soon as the bilevel optimization problem 
becomes complex. Heuristic procedures such as evolutionary 
algorithms have also been developed for handling bilevel 
problems with higher levels of complexity ||281 . ||26| . Most of 
the existing evolutionary procedures often involve enormous 
computational expense, which limit their utility to solving 
bilevel optimization problems only with smaller number of 
variables. 

There are a number of practical problems which are 
bilevel in nature. They are often encountered in transportation 
(network design, optimal pricing), economics (Stackelberg 
games, principal-agent problem, taxation, policy decisions), 
management (network facility location, coordination of multi- 
divisional firms), engineering (optimal design, optimal chem- 
ical equilibria) etc I?]. Complex practical problems are usu- 
ally modified into a simpler single level optimization task, 
which is solved to arrive at a satisfycing solution instead 
of an optimal solution. For the complex bilevel problems, 
classical methods fail because of real world difficulties (like 
non-linearity, discreetness, non-differentiability, non-convexity 
etc), and evolutionary methods are not very useful because of 
their enormous computational expense, a hybrid strategy could 
be solution. In cognizance of the drawbacks associated with 
the two approaches, we propose a hybrid strategy in this paper, 
which utilizes principles from classical optimization, within an 
evolutionary algorithm to quickly approach a bilevel optimum. 
The proposed method is a bilevel evolutionary algorithm based 
on quadratic approximations (BLEAQ) of the lower level 
optimal variables as a function of upper level variables. 

To begin with, we provide a review of the past work on 
bilevel optimization using evolutionary algorithms, followed 
by description of a general bilevel optimization problem. 
Thereafter, we provide a supporting evidence that a strategy 
based on iterative quadratic approximations of the lower level 
optimal variables with respect to the upper level variables 
could be used to converge towards the bilevel optimal solution. 
This is followed by the description of the methodology, which 
utilizes the proposed quadratic approximation principle within 
the evolutionary algorithm. The proposed ideas are further 
supported by results on a number of test problems. Firstly, 
BLEAQ is evaluated on recently proposed unconstrained SMD 
test problems |l23|, where the method is shown to successfully 
handle test problems with 10 dimensions. A performance com- 
parison is performed against a nested evolutionary strategy, 
and the efficiency gain is provided. Secondly, BLEAQ is 
evaluated on a set of standard test problems chosen from the 



2 



literature. These are constrained test problems with smaller 
number of variables. For comparing the results obtained using 
BLEAQ, we choose the algorithm proposed in ||27l . which is 
able to successfully solve all the test problems 

II. Past research on Bilevel Optimization using 
Evolutionary Algorithms 

Evolutionary algorithms for bilevel optimization have been 
proposed as early as in the 1990s. One of the first evolutionary 
algorithm for handling bilevel optimization problems was 
proposed by Mathieu et al. lfTSl . The proposed algorithm was 
a nested strategy, where the lower level was handled using a 
linear programming method, and the upper level was solved 
using a genetic algorithm (GA). Nested strategies could be 
one of the approaches to handle bilevel problems, where 
for every upper level vector a lower level optimization task 
is executed. However, a nested approach is computationally 
expensive and is not a feasible approach for large scale 
bilevel problems. Another nested approach was proposed in 
1281 . where the lower level was handled using the Frank- 
Wolfe algorithm (reduced gradient method). The algorithm 
was successful in handling non-convex bilevel optimization 
problems, and the authors claimed it to be better than the 
classical methods. In 2005, Oduguwa and Rov lfT9l proposed 
a co-evolutionary approach for finding optimal solution for 
bilevel optimization problems. Their approach utilizes two 
populations. The first population handles upper level vectors, 
and the second population handles lower level vectors. The 
two populations interact with each other to converge towards 
the optimal solution. An extension of this study can be found 
in nn . where the authors solve a bilevel application problem 
with linear objectives and constraints. 

Wang et al. 1271 proposed an evolutionary algorithm based 
on a constraint handling scheme, where they successfully solve 
a number of standard test problems. The results produced 
by their approach finds better solutions for a number of test 
problems, as compared to what is reported in the literature. The 
algorithm is able to handle non-differentiability at the upper 
level objective function. However, the method may not be able 
to handle non-differentiability in the constraints, or the lower 
level objective function. Given the robustness of the approach 
in handling a variety of standard test problems, we choose this 
algorithm as one of the benchmarks. Another evolutionary al- 
gorithm proposed in ifTT] utilizes particle swarm optimization 
to handle the bilevel problems. Even this approach is nested as 
it solves the lower level optimization problem for each upper 
level vector The authors show that the approach is able to han- 
dle a number of standard test problems with smaller number 
or variables. However, they do not report the computational 
expense of the nested procedure. A hybrid approach proposed 
in ITSl . which is also nested, utilizes simplex-based crossover 
strategy at the upper level, and solves the lower level using one 
of the classical approaches. This method successfully solves a 
number of standard test problems, however given the nested 
nature of the algorithm it is not scalable for large number 
of variables. The authors report the number of generations 
and population sizes required by the algorithm, which may be 



used to compute the function evaluations at the upper level, 
but they do not explicitly report the total number of function 
evaluations required at the lower level. 

Researchers in the field of evolutionary algorithms have 
also tried to convert the bilevel optimization problem into 
a single level optimization problem using the Karush-Kuhn- 
Tucker (KKT) conditions (26l, [H, [l6l. However, such 
conversions are possible only for those bilevel problems, 
where the lower level is smooth and the KKT conditions can 
be easily produced. 

Recently, there has also been interest in multi-objective 
bilevel optimization using evolutionary algorithms. Some of 
the studies in the direction of solving multi-objective bilevel 
optimization problems using evolutionary algorithms are lfT2l . 
m, 161, 122, Eol, i29l. 

III. Single-Objective Bilevel Problem 

Bilevel optimization is a nested optimization problem which 
involves two levels of optimization tasks. The structure of 
a bilevel optimization problem demands that the optimal 
solutions to the lower level optimization problem may only 
be considered as feasible candidates for the upper level 
optimization problem. The problem contains two classes of 
variables, the upper level variables a;„ € Xjj C K", and the 
lower level variables xi S Xl C M"*. For the lower level 
problem, the optimization task is performed with respect to 
variables xi, and the variables x„ act as parameters. A different 
Xu leads to a different lower level optimization problem, 
whose optimal solution needs to be determined. The upper 
level problem usually involves all variables x = and 
the optimization is expected to be performed with respect to 
both the sets of variables. In the following we provide two 
equivalent definitions of a bilevel optimization problem: 

Definition 1: For the upper-level objective function F : M" x 
M'" E and lower-level objective function / : M" xM'" M 

minimize Fq (a;„ , xi ) 

subject to xi e argmin{/o(x„,x/) : fj{xu,xi) < 0, 

j = l,...,J} 
Fk{xu,xi) <0,k=l,...,K 

The above definition can be stated in terms of set-valued 
mappings as follows: 

Definition 2: Let ^I* : M" =^ M™ be a set-valued mapping, 

argmin{/o(a::u,a;;) : fj{xu,xi) < 0,j = 1,..., J}, 

which represents the constraint defined by the lower-level 
optimization problem, i.e. '^'{xu) C Xl for every x„ G Xjj. 
Then the bi-level optimization problem can be expressed as a 
general constrained optimization problem: 

minimize Fq{xutXi) 

subject to xi £ 

Fk{xu,xi) < 0, fc = 1,...,A' 

where ^' can be interpreted as a parameterized range-constraint 
for the lower-level decision xi. 
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Fig. 2. Graphical representation of a simple bilevel optimization problem. 



The graph of the optimal-decision mapping is interpreted as 
a subset of Xu x Xl 

gph^- = {{xu,xi) ^ Xu Xl \ Xl ^ ^(a^n)}, 

which displays the connections between the upper-level de- 
cisions and corresponding optimal lower-level decisions. The 
domain of ^I*, which is obtained as a projection of gph ^ on 
the upper-level decision space Xu represents all the points 
Xu e Xjj, where the lower-level (follower's) problem has at 
least one optimal solution, i.e. 

dom^- = {Xu\^{Xu) 0}. 

Similarly, the range is given by 

rgc* = {xi\xi G for some .t„}, 

which corresponds to the projection of gph^ on the lower- 
level decision space Xl- 

Figure[T]describes the structure of a bilevel problem in terms 
of two components: (i) gph^, which gives the graph of the 
optimal decision-mapping ^' as a subset of Xjj x Xl', and (ii) 
the plot of Fo evaluated on gph ^, which shows the upper level 
objective function with respect to upper level variables 
when the lower level is optimal xi G The shaded area 

of gph 4' shows the regions where there are multiple lower 
level optimal vectors corresponding to any upper level vector. 
On the other hand, the non-shaded parts of the graph represent 
the regions where "S/ is a single-valued mapping, i.e. there is a 
single lower level optimal vector corresponding to any upper 
level vector Considering gpli^i as the domain of Fq in the 
figure, we can interpret Fq entirely as a function of Xu, i-e. 
Fo(a;u, ^'(.Tii)). Therefore, whenever ^ is multi-valued, we 
can also see a shaded region in the plot of Fq which shows the 
different upper level function values for any upper level vector 
with multiple lower level optimal solutions. For instance, in 
Figure [T] the shaded region of gph 4' corresponds to the shaded 
region of Fq for the upper level vectors between x\ and x^. 

For a more detailed illustration, see the 3 -dimensional graph 
in Figure |2] where the values of the upper and lower level 
objective functions Fq and /o are plotted against the decision 
space Xij xXl- For simplicity, let us again assume that gph 
is the domain of Fq. If we now consider the values of Fq 
plotted against the plane of decision variables, we obtain the 



same information as before. However, as an addition to the 
previous figure, we have also described how the lower level 
objective function /g depends on the upper level decisions. 
In the figure, the shaded planes marked as A, B and C 
represent three different lower level optimization problems 
parameterized by Xu \ and xl^\ respectively. Once an 
upper-level decision vector has been fixed, the lower level 
objective can be interpreted entirely as a function of xi. Hence 
each shaded plane shows a single-variable plot of /o against 
Xl given a fixed Consider, for example, the plane A 
corresponding to the upper level decision Xu K From the shape 
of the lower level objective function it is easy to see that there 
are multiple optimal solutions at the lower level. Therefore, 
also ^ must be set-valued at this point, and the collection 
of optimal lower level solutions is given by "^{xu ^). For the 
other two shaded planes B and C, there is only a single lower 
level optimal solution for the given Xu, which corresponds to 
being single- valued at these points. The optimal upper level 
solution is indicated by point (.t*,.t*), where ^"(0;*) = {x^}. 

IV. Localization of the Follower's Problem 

Ideally, the analysis of a bilevel problem would be greatly 
simplified if the optimal solution mapping could be treated 
as if it were an ordinary function. In particular, for the design 
of an efficient bilevel algorithm, it would be valuable to 
identify the circumstances under which single-valued functions 
can be used to construct local approximations for the optimal 
solution mapping. Given that our study of solution mappings 
is closely related to sensitivity analysis in parametric opti- 
mization, there already exist considerable research on the reg- 
ularity properties of solution mappings, and especially on the 
conditions for obtaining single-valued localizations of general 
set-valued mappings. To formalize the notions of localization 
and single-valuedness in the context of set-valued mappings, 
we have adopted the following definition by Dontchev and 
Rockafellar lITOl : 

Definition 3 (Localization and single-valuedness): For a 
given set-valued mapping : Xjj =^ Xl and a pair 
{xu,xi) G gph 5*, a graphical localization of at Xu for 
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minimum for the follower's problem is given by the standard 
variational inequality 



Fig. 3. Localization around x{f^ for i^;"^- 



xi is a set-valued mapping ^I^ioc such that 

gph^-ioc = ([/ X L) ngph* 

for some upper-level neighborhood U of Xu and lower-level 
neighborhood L of xi, i.e. 



*ioc(a;„) 



n i for Xu e U, 
otherwise. 



If ^loc is actually a function with domain J7, it is indicated 
by referring to a single-valued localization ipioc ■ U — > Xl 
around Xu for xi. 

Obviously, graphical localizations of the above type can 
be defined for any optimal-decision mapping. However, it is 
more interesting to ask when the localization is not only a 
single-valued function but also possesses convenient properties 
such as continuity and certain degree of smoothness. In this 
section, our plan is to study how the lower-level solution 
mappings behave under small perturbations, and clarify the 
regularity conditions in which the solution mapping can be 
locally characterized by a Lipschitz-continuous single-valued 
function. We begin the discussion from simple problems with 
convex set-constraints in section IIV-AI and gradually extend 
the results to problems with general non-linear constraints in 
section IIV-BI 

As an example. Figure [3] shows the ^' mapping and its 
localization t/'ioc around x^u^ for xj*'' . The notations discussed 
in the previous definition are shown in the figure to develop 
a graphical insight for the theory discussed above. 

A. Localization with Convex Constraints 

To motivate the development, suppose that the follower's 
problem is of form 



minimize /o(a;„, a;;) over all xi £ C, 



(1) 



where lower-level decisions are restricted by a simple set- 
constraint C which is assumed to be non-empty, convex, and 
closed in X^. 

When the lower-level objective function /q is continuously 
differentiable, the necessary condition for xi to be a local 



Vih{x,„xi)+Nc{xi)3{), 



(2) 



where 



Nc{xi) = {v\{v, x'l - xi) < 0, for all x'l G C} (3) 

is the normal cone to C at xi and V;/o denotes the gradient 
of /o with respect to the lower-level decision vector The 
solutions to the inequality are referred to as stationary points 
with respect to minimizing over C, regardless of whether or 
not they correspond to local or global minimum. Of course, in 
the special case when /o is also convex, the inequality yields a 
sufficient condition for xi to be a global minimum, but in other 
cases the condition is not sufficient to guarantee lower-level 
optimality of xi. Rather the inequality is better interpreted as 
a tool for identifying quasi-solutions, which can be augmented 
with additional criteria to ensure optimality. In particular, as 
discussed by Dontchev and Rockafellar IfTOl , significant efforts 
have been done to develop tools that help to understand the 
behavior of solutions under small perturbations which we are 
planning to utilize while proposing our solution for solving 
the bilevel problem. With this purpose in mind, we introduce 
the following definition of a quasi-solution mapping for the 
follower's problem: 

Definition 4 (Quasi-solution mapping): The solution candi- 
dates (stationary points) to the follower's problem of form ([U 
can be identified by set- valued mapping vp* : Xjj ^ Xl, 

**(.T„) = {xi\\/ifo{xu,xi)+Nc{xi) 3 0}, 

which represents the set of stationary lower-level decisions 
for the given upper-level decision Xu- When /o is convex, VP* 
coincides to the optimal solution mapping, i.e. = ^I*. 

Whereas direct localization of ^ is difficult, it is easier 
to obtain a well-behaved localization for the quasi-solution 
mapping first, and then establish the conditions under which 
the obtained solutions furnish a lower-level local minimum. 
The approach is motivated by the fact that for variational 
inequalities of the above type, there are several variants of 
implicit function theorem that can be readily applied to obtain 
localizations with desired properties. Below, we present two 
localization-theorems for quasi-solution mappings. The first 
theorem shows the circumstances in which there exists a the 
single-valued localization of 4'* such that it is Lipschitz- 
continuous around a given pair (x* , x^) G gph 4'*. The second 
theorem elaborates the result further under the additional as- 
sumption that C is polyhedral, which is sufficient to guarantee 
that for all points in the neighborhood of x* there is a strong 
local minimum in the follower's problem ([T). 

Definition 5 (Lipschitz): A single-valued localization ijjioc ■ 
Xjj Xl is said to be Lipschitz continuous around an upper- 
level decision x^ when there exists a neighborhood U of Xu 
and a constant > such that 

\i^\oc{xu) - V'ioc(a;^)| < l^\xu ~ for all x„,x„ G U. 

Theorem 6 (Localization of quasi-solution mapping): Sup- 
pose in the lower-level optimization problem ([U, with x^ G 
**(x*), that 
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(i) C is non-empty, convex, and closed in Xl, and 

(ii) /o is twice continuously differentiable with respect to xi, 
and has a strong convexity property at (x-*, x-^), i.e. there 
exists /i > such that 

{'^iik{xu,x*i )w,w) > for all w e C - C. 

Then the quasi-solution mapping 'f* has a Lipschitz continu- 
ous single-valued localization ip around x* for x'^. 

Proof. When the lower-level objective function /o is 
twice continuously differentiable with the inequality 
{'^ufo{XuiX*i)w,'w) > nlwl"^ holding for all w S C - C, 
then by Proposition 2G.4 and Exercise 2G.5 in ifTOl the 
assumptions (a) and (b) in Theorem 2G.2 are satisfied, and 
this gives the rest. □ 

Corollary 7 (Localization with polyhedral constraints): 
Suppose that in the setup of Theorem (|6]l C is polyhedral. 
Then the additional conclusion holds that, for all Xu in some 
neighborhood of .t*, there is a strong local minimum at 

xi = ijj{xu), i.e. for some e > 

foixu,x'i) > foixu,xi) + ^\x'i-xi\'^ for all G C near xi. 

Proof. See Exercise 2G.5 and Theorem 2G.3 in ITOl. □ 

It is worthwhile to note that second order conditions to 
ensure that a quasi-solution corresponds to an actual local 
minimum exist also for other than polyhedral sets, but the 
conditions are generally not available in a convenient form. 

B. Localization with Nonlinear Constraints 

Until now, we have considered the simple class of follower's 
problems where the constraint set is not allowed to depend on 
the upper-level decisions. In practice, however, the follower's 
constraints are often dictated by the leader's choices. There- 
fore, the above discussion needs to be extended to cover the 
case of general non-linear constraints that are allowed to be 
functions of both xi and x^- Fortunately, this can be done by 
drawing upon the results available for constrained parametric 
optimization problems. 

Consider the follower's problem of form 

minimize fa{xu,xi) 

subject to fj{xu, xi) < 0, j = 1, . . . , J 

where the functions fo, fi, ■ ■ ■ , fj are assumed to be twice 
continuously differentiable. Let L{xu, xi,y) — fo{xu,xi) + 
yifi{xu,xi) + ■■■ + yjfj{xu,xi) denote the Lagrangian 
function. Then the necessary first-order optimality condition 
is given by the following variational inequality 

f{xu,xi,y) +NE{xuy) 3 (0,0), 

where 

f{xu, xi,y) ^ {ViL{xu, xi,y), -VyL{xu, xi,y)), 

The pairs {xi , y) which solve the variational inequality are 
called the Karush-Kuhn-Tucker pairs corresponding to the 



upper-level decision Xu- Now in the similar fashion as done 
in Section IIV-AI our interest is to establish the conditions for 
the existence of a mapping tAioc which can be used to capture 
the behavior of the xi component of the Karush-Kuhn-Tucker 
pairs as a function of the upper-level decisions x„. 

Theorem 8 (Localization with Nonlinear Constraints): For 
a given upper-level decision a;*, let {x^,y*) denote a corre- 
sponding Karush-Kuhn-Tucker pair Suppose that the above 
problem for twice continuously differentiable functions fi is 
such that the following regularity conditions hold 

(i) the gradients Vifi{xu, xi), i € I are linearly independent, 
and 

(ii) {w, Vf;L(x* , x^, y*)'w) > for every w ^ 0, w e 
where 

/ = [1,J] I /.(.<,.T*) = 0}, 

AI+ = {w e R" I w _L Vz/,(a;*,.T*) for all i e I}. 
Then the Karush-Kuhn-Tucker mapping S : Xjj ^ X^ x M."^ , 
S{xu) ■■= {{xi,y) I f{xu,xi,y) + NE{xi,y) B (0,0)}, 

has a Lipschitz-continuous single-valued localization s around 

X* for {x^^y*), s{xu) = {tlJ\oc{xu),y{xu)), where ■i/'ioc : 
Xjj — > Xl and y : Xjj — > M"'. Moreover, for every Xu in 
some neighborhood of x* the lower-level decision component 
xi = il^\oc{xu) gives a strong local minimum. 

Proof. See e.g. Theorem 1 in |j9i or Theorem 2G.9 in ifTOl . 

□ 

Despite the more complicated conditions required to es- 
tablish the result for general non-linear constraints case, the 
eventual outcome of the Theorem dU is essentially similar to 
Theorems © and (|7]i. That is, in the vicinity of the current 
optimal solution, we can express the locally optimal follower's 
decisions as a relatively smooth function of the upper-level 
decisions. 

C. Approximation with Localizations 

The single-valued localizations of solution mappings can be 
used as effective tools for alleviating computational burden 
in a greedy evolutionary algorithm. Instead of solving the 
follower's problem from scratch for every upper-level decision, 
we can use localizations to obtain an "educated guess" on the 
new optimal lower-level decision. The intuition for using the 
above results is as follows. 

Suppose that (a;„, Xi) G gpli ^E* is a known lower-level opti- 
mal pair, i.e. xi G ^{xu), and the lower-level problem satisfies 
the regularity conditions in the previous theorems. Then for 
some open neighborhoods U C Xjj of Xu and L C X^ of 
xi, there exists a uniquely determined /x-Lipschitz continuous 
function ip\oc ■ U — > Xl such that .xj ~ il'iodx'^) is the unique 
local optimal solution of the follower's problem in L for each 
x'^ G U. The existence of such V-'ioc leads to a direct strategy 
for generating new solution candidates from the existing ones. 
If the currently known upper level decision Xu is perturbed by 
a small random vector e such that Xu+e £ U, then the newly 
generated point (a;„ + e,ip\oc{xu + e)) gives another locally 
optimal solution where the change in the lower-level optimal 
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Fig. 4. Approximation with localization around 



decision vector is bounded by \ipioc{xu + s) ~ xi\ < ^\e\. In 
theory, this would allow us to reduce the bilevel optimization 
problem to that of minimizing a locally Lipschitz function 
Fq{xu,iI.'\oc{xu))\ see e.g. Dempe et al. for discussion on 
implicit function based techniques. 

However, an essential difficulty for applying the result in 
practice follows from the fact that the mapping i/'ioc is not 
known with certainty, except for a few special cases. To resolve 
this problem in an efficient way, we consider embedding 
the above results within an evolutionary framework, where 
estimates of t/jioc are produced by using samples of currently 
known optimal points. For simplicity, suppose that we want 
to find an estimate of i(j\oc around current best solution 
{xu,xi) £ gph\I>, and let 



) e XuxXl I xf' e ez} c gph* 



be a sample from the neighborhood of (a;„, xi). Then the task 
of finding a good estimator for ip\oc can be viewed as an 
ordinary supervised learning problem: 

Definition 9 (Learning of V'locj-" Let H be the hypothesis 
space, i.e. the set of functions that can be used to predict 
the optimal lower-level decision from the given upper-level 
decision. Given the sample V, our goal is to choose a predictor 
ijj £ "H such that it minimizes the empirical error on the sample 
data-set, i.e. 



ijj = argmin Vl(/i(xW),x|'^), 



hen 



(4) 



where L : Xl x Xl K denotes the empirical risk function. 

For a graphical illustration, see Figure El showing one 
example of a local approximation ijj around xi"' for xi using 
a quadratic function. When the exact form of the underlying 
mapping is unknown, the approximation generally leads to 
an error as one moves away from the point around which 
the localization is performed. In the figure, the approximation 
error is shown for a point xi^' in the neighborhood of xi"'. 
This error may not be significant in the vicinity of x[^\ and 
could provide a good guess for the lower-level optimal solution 
for a new x„ close to xi"'. 

In this paper, we have chosen to use the squared prediction 
error as the empirical risk function when performing the 



approximations, i.e. 

L(Mxi^x«) = |xp'-Mxi'')P, 

and at the same time we have restricted the hypothesis space 
H to consist of second-order polynomials. With these choices 
the empirical risk minimization problem (|4) corresponds to an 
ordinary quadratic regression problem. Therefore, as long as 
the estimation problem is kept light enough and the evolu- 
tionary framework is such that the solution population can be 
used to construct a training dataset V, the use of the estimation 
approach can be expected to enhance the algorithm's overall 
performance by reducing the number of times the lower-level 
optimization problem needs to solved. 

V. Algorithm description 

In this section, we provide a description for the bilevel 
evolutionary algorithm based on quadratic approximations 
(BLEAQ). The optimization strategy is based on approxima- 
tion of the lower level optimal variables as a function of upper 
level variables. To begin with, an initial population of upper 
level members with random upper level variables is initialized. 
For each member, the lower level optimization problem is 
solved using a lower level optimization scheme, and optimal 
lower level members are noted. Based on the lower level 
optimal solutions achieved, a quadratic relationship between 
the upper level variables and each lower level optimal variable 
is established. If the approximation is good (in terms of mean 
squared error) then it can be used to predict the optimal lower 
level variables for any given set of upper level variables. This 
eliminates the requirement to solve a lower level optimization 
problem. However, one needs to be cautious while accepting 
the optimal solutions from the quadratic approximations as 
even a single poor solution might lead to a wrong bilevel 
optimum. At each generation of the algorithm, a new quadratic 
approximation is generated which goes on improving as the 
population converges towards the true optima. At the termi- 
nation of the procedure, the algorithm not only provides the 
optimal solutions to a bilevel problem, but also acceptably 
accurate functions representing the relationship between upper 
and lower variables at the optima. These functions can be 
useful for strategy developments or making decisions in bilevel 
problems appearing in fields like game theory, economics, 
science or engineering. In the following we provide a step- 
by-step procedure for the algorithm. 



S. 1 



S. 2 



S. 3 



Initialization: The algorithm starts with a random 
population of size N, which is initialized by gen- 
erating the required number of upper level variables, 
and then executing the lower level evolutionary opti- 
mization procedure to determine the corresponding 
optimal lower level variables. Fitness is assigned 
based on upper level function value and constraints. 
Tagging: Tag all the upper level members as 1 which 
have undergone a lower level optimization run, and 
others as 0. 

Selection of upper level parents: Given the current 
population, we randomly choose 2(/_t — 1) number 
of members from the population and perform a 
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tournament selection based on upper level function 
value. This produces /i — 1 number of parents. 

S. 4 Evolution at the upper level: Choose the best tag 
1 member as the index parent, and the members 
from the previous step as other parents. Then, create 
A number of offsprings from the chosen /i parents, 
using crossover and polynomial mutation operators. 

S. 5 Quadratic approximation: If the number of tag 
1 members in the population is greater than 
(dim(x„)+iKd»m(x„)+2) ^ j^en select all 

the tag 1 upper level members to construct quadratic 
functions to represent each of the lower level op- 
timal variables as a function of upper level vari- 
ables. If the number of tag 1 members is less than 

(dim{x^)+i){dim{x^,)+2) _^ (lim{x,j) then a quadratic 

approximation is not performed. 

S. 6 Lower level optimum: If a quadratic approxima- 
tion was performed in the previous step then find 
the lower level optima for the offsprings using the 
quadratic approximation. If the mean squared error 
Cmse is less than eo(le-3), then the quadratic ap- 
proximation is considered good and the offsprings 
are tagged as 1, otherwise they are tagged as 0. If 
a quadratic approximation was not performed in the 
previous step then execute lower level optimization 
runs for each of the offsprings. To execute lower level 
optimization for an offspring member, the closest 
tag 1 parent is determined. From the closest tag 
1 parent, the lower level optimal member {x] ') is 
copied (Refer Section IV- Al l. Thereafter, a lower level 
optimization run is performed which first tries to 
optimize the problem using a quadratic programming 
approach, and if unsuccessful, it uses an evolution- 
ary optimization algorithm. The copied lower level 
member from the closest upper level parent is used 
in the lower level optimization run. Tag the offspring 
as 1 for which lower level optimization is performed. 

S. 7 Population Update: After finding the lower level vari- 
ables for the offsprings, r members are chosen from 
the parent population. A pool of chosen r members 
and A offsprings is formed. The best r members from 
the pool replace the chosen r members from the 
population. A termination check is performed and 
the algorithm moves to the next generation (Step 3) 
if the termination check is false. 



A. Property of two close upper level members 

For two close upper level members, it is often expected that 
the lower level optimal solutions will also lie close to each 
other. This scenario is explained in Figure |5] where xL^' and 

(2) 

Xu are close to each other, and therefore their corresponding 
optimal lower level members are also close. On the other hand 
x'^P is far away from x'^^ and xi^'*. Therefore, its optimal 
lower level member is not necessarily close to the other two 
lower level members. This property of the bilevel problems 
is utilized in the proposed algorithm, such that if a lower 
level optimization has been performed for one of the upper 
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Fig. 5. Lower level optimal solutions for three different upper level members. 



level members, then the corresponding lower level member is 
utilized while performing a lower level optimization task for 
another upper level member in the vicinity of the previous 
member 



B. Lower Level Optimization 

At the lower level we first use a quadratic programming 
approach to find the optima. If the procedure is unsuccessful 
we use a global optimization procedure using an evolutionary 
algorithm to find the optimum. The lower level optimization 
using evolutionary algorithm is able to handle complex opti- 
mization problems with multimodality. The fitness assignment 
at this level is performed based on lower level function value 
and constraints. Following are the steps for the lower level 
optimization procedure: 

1 ) Lower level quadratic programming: 

S. 1 Create (rf»'"(^0+iKd»ru(x,)+2) ^ dtm{xi) lower level 

(c) 

points about x) using polynomial mutation. 

(c) 

S. 2 Construct a quadratic function about x^ using the 
created points. Construct linear approximations for 
the constraints. 

S. 3 Optimize the quadratic function with linear con- 
straints using a sequentially quadratic programming 
approach. 

S. 4 Compute the value of the optimum using the 
quadratic function and the lower level function. If the 
absolute difference is less than (5mm then accept the 
solution as lower level optimum, otherwise perform 
an evolutionary optimization search. 

2) Lower level evolutionary optimization: 

S. 1 If lower level evolutionary optimization is executed 
directly without quadratic programming, then ran- 
domly initialize n lower level members. If quadratic 
programming is executed, then use the solution ob- 
tained using quadratic programming as one of the 
population members and randomly initialize other 
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n — 1 lower level members. The upper level variables 
are kept fixed for all the population members. 

S. 2 Choose 2^ members randomly from the population, 
and perform a tournament selection to choose ji 
parents for crossover 

S. 3 The best parent among ji parents is chosen as the in- 
dex parent and A number of offsprings are produced 
using the crossover and mutation operators. 

S. 4 A population update is performed by choosing r 
random members from the population. A pool is 
formed using r chosen members and A offsprings, 
from which the best r members are used to replace 
the r chosen members from the population. 

S. 5 Next generation (Step 2) is executed if the termina- 
tion criteria is not satisfied. 

C. Constraint handling 

As we know by now that a bilevel problem has two levels 
of optimization tasks. There may be constraints at the lower 
level and also constraints at the upper level. We modify any 
given bilevel problems such that lower level constraints belong 
only to the lower level optimization task, however, at the upper 
level we include both the upper and lower level constraints. 
This is done to ensure at the upper level that a solution which 
is not feasible at the lower level cannot be feasible at the upper 
level, no matter whether the lower level optimization task 
is performed or not. While computing the overall constraint 
violation for a solution at the upper level, it is not taken into 
account whether the lower level variables are optimal or not. 
We use a separate tagging scheme in the algorithm to account 
for optimality or non-optimality of lower level variables. 

The algorithm uses similar constraint handling scheme at 
both the levels, where the overall constraint violation for any 
solution is the summation of the violations of all the equality 
and inequality constraints. A solution a;'-*' is said to 'constraint 
dominate' 15] a solution .t^-') if any of the following conditions 
are true: 

1) Solution x'*-* is feasible and solution x'-') is not. 

2) Solution x^*-* and a;'^-* are both infeasible but solution 
x^*^ has a smaller overall constraint violation. 

3) Solution x^*-* and x^-*-* are both feasible but the objective 
value of x^*^ is less than that of 

D. Crossover Operator 

The crossover operator used in Step 2 is similar to the PCX 
operator proposed in |f24l|- The operator creates a new solution 
from 3 parents as follows: 

c = x''"^ + u^d + uj„i- (5) 

The terms used in the above equation are defined as follows: 

• x^^-* is the index parent 

• d = x^P-* — g, where g is the mean of /i parents 

• p^^'> and p^^'' are the other two parents 

• ojj = 0.1 and = ll"{i,)^g^^^ ^6 the two parameters. 
The two parameters uj^ and w^, describe the extent of vari- 
ations along the respective directions. At the upper level, a 



crossover is performed only with the upper level variables 
and the lower level variables are determined from the quadratic 
function or by lower level optimization call. At the lower level, 
crossover is performed only with the lower level variables and 
the upper level variables are kept fixed as parameters. 

E. Termination Criteria 

The algorithm uses a variance based termination criteria at 
both the levels. At the upper level, when the value of a„ 
described in the following equation becomes less than ct^"^' 
the algorithm terminates. 

From the above equation, a„ is always greater than and 
should be less than 1 most of the times. AI is the number 
of upper level variables in the bilevel optimization problem, 
.xjj^ : i G {1, 2, . . . , A/} represents the upper level variables in 
generation number T, and x\^^^ :iG{l,2,...,Af} represents 
the upper level variables in the initial random population. The 
value of should be kept small to ensure a high accuracy. 

A similar termination scheme is used for the lower level 
evolutionary algorithm. The value for ai is determined as 
follows. 

In the above equation, m is the number of variables at the 
lower level, Xj : i G {1, 2, . . . , m] represents the lower level 
variables in generation number t, and x|^^ : i G {1, 2, . . . , m} 
represents the lower level variables in the initial random 
population for a particular lower level optimization run. A high 
accuracy is desired particularly at the lower level, therefore 
the value for ai should be kept low. Inaccurate lower level 
solutions may mislead the algorithm in case of a conflict 
between the two levels. 

F. Parameters 

The parameters in the algorithm are fixed as /^t = 3, A = 2 
and r = 2 at both the level. Crossover probability is fixed 
at 0.9 and the mutation probability is 0.1. The upper level 
population size N and the lower level population size n are 
fixed at 50 for all the problems. The values for a^°^ and af'°^ 
are fixed as le — 5 at both the upper and the lower levels. 

VI. SMD PROBLEMS 

A set of test problems (SMD) for single objective bilevel 
optimization has recently been proposed in [23 1 . The proposed 
problems are scalable in terms of number of decision variables. 
In this section, we discuss about the construction of these test 
problems in brief and provide a description of the proposed 
SMD test problems. The construction of the test problem 
involves splitting the upper and lower level functions into three 
components. Each of the components is specialized for induc- 
ing a certain kind of difficulty into the bilevel test problem. 
The functions are chosen based on the difficulties required 
in terms of convergence at the two levels, and interaction 
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between the two level. A generic bilevel test problem is stated 
as follows: 



Fo {Xu , ) = -Fq i^ul ) + {xii ) + ^0 i^u2 , X12) 
/o(x„,Xi) = fo{Xul,Xu2) + foixil) + fo{Xu2,Xl2) 

where 

Xu = ixui,Xu2) and x/ = (x/i, a;;2) 



(8) 



The above equations contain three terms at both the levels. 
Table |l] provides a summary for the role played by each term 
in the equations. The upper level and lower level variables are 
broken into two smaller vectors (refer Panel A in Table [), such 
that, vectors x„i and xn are used to induce complexities at the 
upper and lower levels independently, and vectors Xu2 and X12 
are responsible to induce complexities because of interaction. 
The upper and lower level functions are decomposed such that 
each of the components is specialized for a certain purpose 
only (refer Panel B in Table|I]l. The term at the upper 

level, is responsible for inducing difficulty in convergence 
solely at the upper level, and the term f2{xii), at the lower 
level, is responsible for inducing difficulty in convergence 
solely at the lower level. The term F2{xii) decides if there 
is a conflict or a cooperation between the two levels. The 
terms -Fb (a;i2 , 2^112 ) and fzixn^ 1^2) induce difficulties because 
of interaction at the two levels, though F3 {xi2 , Xu2 ) may 
also be used to induce a cooperation or a conflict. Finally, 
fiixui^Xui) is a fixed term at the lower level which does 
not induce any difficulties, rather helps to create a functional 
dependence between lower level optimal solution(s) and the 
upper level variables. 

A. SMDl 

SMDl is a test problem with cooperation between the two 
levels. The lower level optimization problem is a simple con- 
vex optimization task. The upper level is convex with respect 
to upper level variables and optimal lower level variables. 

/o^ = ELi«i)' 

/o ~ Ei=i(2^M2 ~ tanxjj) 
The range of variables is as follows, 

<iG[-5,10], V i^{\,2,...,p] 

<2e[-5,i0], V ?:e{i,2,...,r} 

x\^ e [-5,10], V iG{l,2,...,q} 
42G(^>f)> V *e{l,2,...,r} 

Relationship between upper level variables and lower level 
optimal variables is given as follows. 



(10) 



-0, V ze {i,2,...,p} 



tan-i<2, V I e {l,2,...,r} 



(11) 



The values of the variables at the optima are = and xi is 
obtained by the relationship given above. Both the upper and 
lower level functions are equal to at the optima. Figure |6] 
shows the contours of the upper level function with respect 
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P: Upper level function contours 
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S: Upper level function contours 
with respect to lower level variables 

at (x^j,x^2) = 



Fig. 6. Upper level function contours for a four variable SMD 1 test problem. 



to the upper and lower level variables for a four variable 
test problem with dim{xui) = dim{xu2) = dim{xii) = 
dim{xi2) = 1. Figure |6] shows the contours of the upper level 
function at each Xu in Sub-figure P assuming that the lower 
level variables are optimal. Sub-figure S shows the behavior 
of the upper level function with respect to xi at optimal Xu- 



B. SMDl 

SMD2 is a test problem with a conflict between the upper 
level and lower level optimization tasks. The lower level opti- 
mization problem is a convex optimization task. An inaccurate 
lower level optimum may lead to upper level function value 
better than the true optimum for the bilevel problem. The 
upper level is convex with respect to upper level variables 
and optimal lower level variables. 



Fl = -Y.UM,Y 

Fq = Ej;=i (2^112)^ ~ 

/t?-ELl«2-l0g^?2)' 



EI=i«2 -loga;;2)^ 



(12) 



The range of variables is as follows. The range of variables is 
as follows. 



<i e [-5,10], V te{i,2,...,p} 

<2G[-5,1], V iG{l,2,...,r} 
xl G [-5,10], V te{l,2,...,q} 



(13) 



X 
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G(0,e], V iG{l,2,...,r} 



Relationship between upper level variables and lower level 
optimal variables is given as follows. 



xl^O, V zG{l,2,...,g} 

-1 



log' 



V I G {l,2,...,r} 



(14) 



The values of the variables at the optima are Xu = and xi 
is obtained by the relationship given above. Both the upper 
and lower level functions are equal to at the optima. Figure 
|7] represents the same information as in Figure |6] for a four 
variable bilevel test problem. 
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TABLE I 

Overview of test-problem framework components 



Panel A: Decomposition of decision variables 



Upper-level variables 


Lower-level variables 


Vector 


Purpose 


Vector 


Purpose 


Xul 
Xu2 


Complexity on upper-level 
Interaction with lower-level 


Xl2 


Complexity on lower-level 
Interaction with upper-level 



Panel B: Decomposition of objective functions 



Upper-level objective function 


Lower-level objective function 


Component 


Purpose 


Component 


Purpose 


F^(Xu2,Xi2) 


Difficulty in convergence 
Conflict / co-operation 
Difficulty in interaction 


/o(x„i,a;„2) 

fSixii) 
foiXu2,Xi2) 


Functional dependence 
Difficulty in convergence 
Difficulty in interaction 




^ S: Upper level function contours 

with respect to lower level variables 

P: Upper level function contours at (x x ) = (0 0) 

ul' u2 ' 

with respect to upper level variables 
at optimal lower level variables 

Fig. 7. Upper level function contours for a four variable SMD2 test problem. 




P: Upper level function contours at (''^^'^^^2^ ~ ^^'^^ 

with respect to upper level variables 
at optimal lower level variables 



Fig. 8. Upper and lower level function contours for a four variable SMD3 
test problem. 



C. SMD3 

SMD3 is a test problem with a cooperation between the two 
levels. The difficulty introduced is in terms of multi-modality 
at the lower level which contains the Rastrigin's function. The 
upper level is convex with respect to upper level variables and 
optimal lower level variables. 

^ x]r=i(^ui)^ 



— cos 2'KX 



(16) 



/,? = E:=i(«2)'-tanxl,)2 
The range of variables is as follows, 

xl, e [-5,10], V ie{i,2,...,p} 

a;L2 € [-5,10], V ie{l,2,...,r} 

xl e [-5,10], V ie{l,2,...,g} 

^l2G(3':f)> V *G{l,2,...,r} 

Relationship between upper level variables and lower level 
optimal variables is given as follows, 

xl,^0, V ie{l,2,...,q} 
zj2=tan-i(<2)', V *G{l,2,...,r} 

The values of the variables at the optima are a;„ = and xi 
is obtained by the relationship given above. Both the upper 
and lower level functions are equal to at the optima. Figure 
|8] shows the contours of the upper level function at each Xu 



in Sub-figure P assuming that the lower level variables are 
optimal. Sub-figure S shows the behavior of the lower level 
function at optimal x^- 

D. SMD4 

SMD4 is a test problem with a conflict between the two 
levels. The difficulty is in terms of multi-modality at the lower 
level which contains the Rastrigin's function. The upper level 
is convex with respect to upper level variables and optimal 
lower level variables. 

= EUi<2? - ELi(l<2l - iog(i + ^h)? 

f^^j::=AK2\-^og{i+xi,)r 

The range of variables is as follows, 

<i e [-5,10], V ^e{l,2,...,ri 



^0^ 



l^ei-i,!], V ^e{l,2,...,r} 
i, e [-5,10], V {1,2,. ..,<?} 



(19) 



x]^ G [0,e], V i G {1,2, ...,r} 

Relationship between upper level variables and lower level 
optimal variables is given as follows, 

x|i=0, V iG{l,2,...,g} 



1, V i G {1,2, ...,r} 



(20) 
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P: Upper level function contours 
with respect to upper level variables 
at optimal lower level variables 



S: Lower level function contours 
with respect to lower level variables 

at (x^j.x^j) = 



Fig. 9. Upper and lower level function contours for a four variable SMD4 
test problem. 



The values of the variables at the optima are Xu = and xi is 
obtained by the relationship given above. Both the upper and 
lower level functions are equal to at the optima. Figure |9] 
represents the same information as in Figure |8] for a four 
variable bilevel problem. 

E. SMD5 

SMD5 is a test problem with a conflict between the two 
levels. The difficulty introduced is in terms of multi-modality 
and convergence at the lower level. The lower level problem 
contains the banana function such that the global optimum 
lies in a long, narrow, flat parabolic valley. The upper level is 
convex with respect to upper level variables and optimal lower 
level variables. 



Wl 



1 



/(} = ELi«i)' ^ ^ 

Po^Y.U{\<2\-{^n?? 
The range of variables is as follows, 

<iG[-5,10], V iG{l,2,...,p} 
xl2G[-5,10], V ie{\,2,...,r} 

x\^ e [-5,10], V ie{i,2,...,q} 

e [-5,10], V ie{l,2,...,r} 



(22) 
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Relationship between upper level variables and lower level 
optimal variables is given as follows. 



X 
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(23) 



1, V ie{l,2,...,q} 
A2 = ^MA. V ze{l,2,...,r} 

The values of the variables at the optima are = and xi is 
obtained by the relationship given above. Both the upper and 
lower level functions are equal to at the optima. 

F. SMD6 

SMD6 is a test problem with a conflict between the two 
levels. The problem contains infinitely many global solutions 
at the lower level, for any given upper level vector Out of the 



entire global solution set, there is only a single lower level 
point which corresponds to the best upper level function value. 



(24) 







^+E?i%i(-ii); ^ 


FS 


— ~ Si=l(^Zl 


F! 


= Si=l(^«2) 




/(} 


= ELi(<i)' 






= ELi(^ii)'- 


^ /^i=q+l.i=i+2\'^ll 









The range of variables is as follows, 

<i e [-5,10], V I e {i,2,...,p} 
<2 e [-5,10], V I e {1,2,. ..,r} 
x^ e [-5,10], V i e {i,2,...,g + s} 

a;j2 e [-5,10], V ie{l,2,...,r} 



(25) 



Relationship between upper level variables and lower level 
optimal variables is given as follows. 



c,\=0, V ie{l,2,...,q} 
'^l2^<2, V *e{l,2,...,r} 



(26) 



The values of the variables at the optima are a;„ = and xi is 
obtained by the relationship given above. Both the upper and 
lower level functions are equal to at the optima. 

VII. Standard test problems 

Table HIl provides a set of 8 standard test problems collected 
from the literature. The dimensions of the upper and lower 
level variables are given in the first column, and the problem 
formulation is defined in the second column. The third column 
provides the best known solution available in the literature for 
the chosen test problems. 

VIII. Results 

In this section, we provide the results obtained on the 
SMD test problems and other test problems using the BLEAQ 
approach. The section is divided into two parts. The first 
part contains the results obtained using the BLEAQ approach, 
which has been compared with the results obtained using a 
nested procedure. It was difficult to identify an approach from 
the existing literature, which can efficiently handle the SMD 
test problems with 10 variables. Therefore, we have chosen the 
nested approach as the benchmark for this study. The second 
part contains the results obtained on 8 standard constrained test 
problems, which have been studied by others in the past. We 
compare our method against the approach proposed by Wang 
et al. (2005), which is able to handle all the test problems 
successfully. 

A. Results for SMD test problems 

We report the results obtained by the nested procedure and 
the BLEAQ approach in this sub-section on 10-dimensional 
unconstrained SMD test problems. For test problems SMDl 
to SMD5 we choose p = 3, 9 = 3 and r = 2, and for SMD6 
we choose p = ?>, q=\, r = 2 and s = 2. Tables |lll] and 
HV] provide the results obtained using the nested approach on 
these test problems. The nested approach uses an evolutionary 
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TABLE II 

Description of the selected standard test problems (TP1-TP4). 


Problem 


ForniulLiliun 


Best Known Sol. 


TPl 


Minimize Fq{xu, = (a;^ - 30)^ + {x^ - 20)^ - 20xj + 20xf , 




A/ = 2 TTi = 2 


y e argmin ( foi^r^^^O = i^i " ^} )^ + i^l " ^f)^ \ , 
(y) 1 0<y, <10, i = l,2 /■ 

^li + ^^u ^ + - a;2 < 15 


Fo = 225.0 
/O = 100.0 


TP2 


Minimize F()(xu, x,) = 2x}, + 2j;2 _ s^-l _ 33.2 _ qq 




A/ = 2 TJT = 2 


s.l. 

1 (i^ 3;2 + 20) I 

y e argmin < ~ > , 
(y) ^11 ~ 2a:^ > 10, xf^ - 2xf > 10 

I -10 > > 20, i = 1, 2 J 
x^ + 3:2 _^ ^1 _ 2a;2 ^ 
< x^ < 5Q, i = 1,2. 


Fq = 0.0 
/q = 100.0 


TP3 


Minimize Fq{xu, x^) = -(a;J-j2 _ 3(x2 )2 „ 4^1 + (a;2)2^ 




A/ = 2 TJT = 2 


s.l. 

r /oC^ti>^i) = 2(xi)2 + (:rl)2 _ 5^2 -j 

■ J - 2a:?^, + (a:?,)2 _ ^ ^2 > .3 1 
y e argmin / ^ u ^ u' L L - S , 
(y) x(^ + 3=c;l - 4x^ > 4 

I < yi, i = 1, 2 J 
(x^ )2 _^ < 4^ 

< x,^ , i = 1, 2 


Fo = -18.6787 
/O = -1.0156 



TP4 

A/ = 2 TTi = 3 


Minimize Fq(x 
s.l. 

y G argmin 

(») 

< Xi , i = 


11, X;) = -8x1 _ 4,2 ^ 4,1 _ 4Q,2 _ 

/o(x„,x;) = xi + 2xJ + xl + xf 
x2 + xp - xl < 1 
2x1 _ ,1 _^ 2x? - 0.5x? < 1 
2x2 ^ 2xj - xp - 0.5x^ < 1 
, < Hi, i = 1, 2 
1, 2 


-4x3, 

+ 2x3 1 


Fo = -29.2 
So = 3-2 


TPS 

A/ = 2 TJT = 2 


Minimize f Q (x^ -x^) = rt(j:u)mu — — +0.5t{x|)xj, 

r /o(xu, X,) = 0.5t(a;i)hx, - t(6(xu))x, l 

-0.333xJ + x2 _ 2 < 1 
H G aremin \ t ^ o ft 
tv) "l ~ 0'333xj' - 2 < I 
[ < Hi, i = 1, 2 J 

wlicre 

"=(3 10 )."(-) = ( li' -3 )-.'■ = 01 

t ( ■ ) denotes transpose of a vector 


Fo = -3-6 
So = -2-0 


TP6 

A^ = 1 TIT = 2 


Minimize Fn(x 
(x„,x,) 
s.t. 

y £ argmin 

(h) 

< xl 


Li, X;) = (xl - 1)2 + 2x1 _ 2,1 ^ 

■ /o(^u. ^1) = (2x,l - 4)2+ 1 
(2x2 _ 4)2 + ,1,1 

4x1 ^ 2,1 4,2 ^ 42 
4x? - 4x1^ - 5x) < -4 
4x1^ - 4xj^ + 5xp < 4 
4x1 _ 4,1 _j_ 5,2 ^ 4 

■ < Hi , ^ = 1, 2 J 


Fo = -1-2091 
/O = 7-6145 


TP7 

A/ = 2 TJT = 2 


Minimize ( x 

(X„,X;) 

y G argmin 

(y) 

(xl)2 + (3,2 
< x^, i = 


(xi+xl)(xj+x2) 
- l + xlxl+x2,2 . 

•'o(-".-i) i+,l',i+,a,j 

. < Hi < Xi, 1 = 1,2 
)2 < 100 
1, 2 




Fo = -1-98 
So = 1-98 


TPS 

A / = 2 TTI = 2 


Minimize Fq(x 
s.t. 

y G argmin 

(h) 

< xi < 50 


11, X;) = |2xl + 2x2 _ 3,1 _ 3,2 _ f 

/0(=:u.="i) = (=•:} 20)2 + 
("^? - + 20)2 
2x1 _ ,1 ^ 40 < 
2x2 - ,J + 10 < 
-10 < Hi < 20, 1 = 1,2 
j - 2x^ < 40 
1 = 1,2 


0, 


Fo = 0-0 

So = 100.0 
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algorithm at both the levels. For every upper level member a 
lower level optimization problem is solved in this approach 
1231 . The approach successfully handles all the test problems. 
However, as can be observed from Table |III] the number of 
function evaluation required are very high at the lower level. 
We use the nested approach as a benchmark to assess the 
savings obtained from the BLEAQ approach. Tables IV] and I VII 
provides the results obtained using the BLEAQ approach. 
Fourth and Fifth column in Table [V] provides the median 
function evaluations required at the upper and lower levels 
respectively. The numbers in the brackets provide a ratio of the 
function evaluations required using nested approach against 
the function evaluations required using BLEAQ. The nested 
approach requires 2 to 5 times more function evaluations at 
the upper level, and more than 10 times function evaluations at 
the lower level as compared to BLEAQ. Both the approaches 
are able to successfully handle all the test problems. How- 
ever, BLEAQ dominates the nested approach by a significant 
margin, particularly in terms of the function evaluations at the 
lower level. 

B. Convergence Study 

In this sub-section we provide the results on the convergence 
of the algorithm towards the bilevel optimal solution. We 
performed a convergence study for the first two SMD test 
problems with 10-dimensions, and the results are presented in 
Figures [TO] and [TT] for one particular run. The figures show 
the progress of the elite member at each of the generations. 
The upper plot shows the convergence towards the upper 
level optimal function value, and the lower plot shows the 
convergence towards the lower level optimal function value. 
The algorithm preserves the elite member at the upper level, 
so we observe a continuous improvement in the upper plot. 
However, in a bilevel problem an improvement in the elite at 
the upper level does not guarantee a continuous improvement 
in the lower level function value. Therefore, we observe that 
the lower level convergence plot is not a continuously reducing 
plot rather contains small humps. 

Next, we evaluate the algorithm's performance in approxi- 
mating the actual ^ mapping. For this we choose the problem 
SMDl, which has the following mapping. 

=0, V {l,2,...,p} 
a;j2 = tan-i <2, V i G {1, 2, . . . , r} ^ ' 

We set p = 3, g = 3 and r = 2 to get a 10-dimensional SMDl 
test problem. The ^I* mapping for SMDl test problem is vari- 
able separable. Therefore, in order to show the convergence 
of the approximation on a 2-d plot, we choose the variable 
xj^, and show its approximation with respect to x^^g- The 
true relationship between the variables is shown in Figure [12] 
along with the quadratic approximations generated at various 
generations. It can be observed that the approximations in the 
vicinity of the true bilevel optimum improves with increasing 
number of generations. 

C. Results for constrained test problems 

In this sub-section, we provide results for 8 standard con- 
strained test problems chosen from the literature. We compare 
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our results against the approach proposed by Wang et al. 
(2005). The reason for the choice of this approach as a 
benchmark is that the approach was successful in solving all 
the chosen constrained test problems. The results obtained 
using the BLEAQ and WJL 1271 approach has been provided 
in Tables IVIII IVIIII and |IXl Table IVIII provides the minimum, 
median and maximum function evaluations required to solve 
the chosen problems using the BLEAQ approach. Table IVIIII 
provides the accuracy obtained at both the levels, in terms 
of absolute difference from the best known solution to a 
particular test problem. The numbers in the brackets provide 
the best solution known from the literature for each of the test 
problems. The table also provides the median number of lower 
level calls, and the average number of lower level function 
evaluations required per lower level call. Table IIX I compares 
the mean function evaluations at the upper and lower levels 
required by BLEAQ against that required by WJL. In their 
paper, Wang et al. have reported the function evaluations as 
sum of function evaluations for the two levels. The reported 
metric in their paper is the average function evaluations from 
multiple runs of their algorithm. We have computed a similar 
metric for BLEAQ, and the results are reported in the table. 
It can be observed that WJl requires close to an order of 
magnitude times more function evaluations for most of the 
test problems. This clearly demonstrates the efficiency gain 
obtained using the BLEAQ approach. It also suggests that 
the mathematical insights used along with the evolutionary 
principles in the BLEAQ approach is helpful in converging 
quickly towards the bilevel optimal solution. 

IX. Conclusions 

In this paper, we have propose an efficient bilevel evolu- 
tionary algorithm (BLEAQ) which works by approximating 
the optimal solution mapping between the lower level optimal 
solutions and the upper level variables. The algorithm not 
only converges towards the the optimal solution of the bilevel 
optimization problem, but also provides the optimal solution 
mapping at the optima. This provides valuable information, 
as to how the lower level optimal solution changes based on 



14 



TABLE III 

Function evaluations (FE) for the upper level (UL) and the lower level (LL) from 1 1 runs of nested approach. 



Pr. No. 


Best 


Median 


Worst 


Total LL 
FE 


Total UL 
FE 


Total LL 
FE 


Total UL 
FE 


Total LL 
FE 


Total UL 
FE 


SMDl 


834478 


1195 


1628136 


2473 


2037784 


3258 


SMD2 


1001588 


1495 


1500725 


2239 


2095710 


3341 


SMD3 


868658 


1232 


1412300 


2231 


1790398 


2684 


SMD4 


550263 


730 


1106389 


1634 


1337435 


1988 


SMD5 


1257659 


1666 


1947316 


2875 


2561079 


3528 


SMD6 


1243533 


1624 


2326072 


3076 


2946779 


3812 



TABLE IV 

Accuracy for the upper and lower levels, and the lower level calls from 1 1 runs of nested approach. 



Pr. No. 


Median 


Median 


Median 


LL Evals 
LI, Calls 


UL Accuracy 


LL Accuracy 


LL Calls 


SMDl 


0.0051 


0.0016 


2473 


647.02 


SMD2 


0.0015 


0.0005 


2239 


653.82 


SMD3 


0.0087 


0.0026 


2231 


649.59 


SMD4 


0.0078 


0.0027 


1634 


694.91 


SMD5 


0.0012 


0.0032 


2875 


717.13 


SMD6 


0.0096 


0.0071 


3076 


767.03 



TABLE V 

Function evaluations (FE) for the upper level (UL) and the lower level (LL) from 1 1 runs of the proposed BLEAQ. 



Pr. No. 


Best Func. Evals. 


Median Func. Evals. 


Worst Func. Evals. 


LL 


UL 


LL 

(Savings) 


UL 

(Savings) 


LL 


UL 


SDMI 


99315 


610 


110716 (14.71) 


740 (3.34) 


170808 


1490 


SDM2 


70032 


376 


91023 (16.49) 


614 (3.65) 


125851 


1182 


SDM3 


110701 


620 


125546 (11.25) 


900 (2.48) 


137128 


1094 


SDM4 


61326 


410 


81434 (13.59) 


720 (2.27) 


101438 


1050 


SDM5 


102868 


330 


126371 (15.41) 


632 (4.55) 


168401 


1050 


SDM6 


5558 


658 


8393 (277.14) 


946 (3.25) 


10017 


1322 



changes in the upper level variable vector in the vicinity of 
the bilevel optima. The algorithm has been tested on two sets 
of test problems. The first set of test problems are recently 
proposed SMD test problems which are scalable in terms 
of number of variables. The method has been tested on a 
10-dimensional instance of these test problems. The second 
set of test problems are 8 standard test problems chosen 
from the literature. These problems are constrained and have 
relatively smaller number of variables. The results obtained 
using the BLEAQ approach has been compared against a 
nested strategy for the SMD test problems, and against the 
WJL approach for the standard constrained test problems. For 
both the sets, BLEAQ offers significant improvement in the 
number of function evaluations at both the levels. 

X. Acknowledgements 

Ankur Sinha and Pekka Malo would like to acknowledge the 
support provided by Liikesivistysrahasto and Helsinki School 
of Economics foundation. Authors of the paper would like to 
acknowledge the conversations with Prof. Yu-Ping Wang. 



References 

[1] E. Aiyoshi and K. Shimizu. Hierarchical decentralized systems and its 

new solution by a barrier method. IEEE Transactions on Systems, Man, 

and Cybernetics, 11:444^49, 1981. 
[2] J.F. Bald and J. Falk. An explicit solution to the multi-level programming 

problem. Computers and Operations Research, 9:77-100, 1982. 
[3] L. Bianco, M. Caramia, and S. Giordani. A bilevel flow model for 

hazmat transportation network design. Transportation Research Part C: 

Emerging technologies, 17(2): 175-196, 2009. 
[4] B. Colson, P. Marcotte, and G. Savard. An overview of bilevel 

optimization. Annals of Operational Research, 153:235-256, 2007. 
[5] K. Deb. An efficient constraint handling method for genetic algorithms. 

Computer Methods in Applied Mechanics and Engineering, 186(2- 

4):31 1-338, 2000. 

[6] K. Deb and A. Sinha. An efficient and accurate solution methodol- 
ogy for bilevel multi-objective programming problems using a hybrid 
evolutionary-local-search algorithm. Evolutionary Computation Journal, 
18(3):403^49, 2010. 

[7] S. Dempe. Annotated bibliography on bilevel programming and mathe- 
matical programs with equilibrium constraints. Optimization, 52(3):339- 
359, 2003. 

[8] S. Dempe, J. Dutta, and S. Lohse. Optimality conditions for bilevel 
programming problems. Optimization, 55(56):505-524, 2006. 

[9] S. Dempe and S. Vogel. The subdifferential of the optimal solution in 
parametric optimization. Technical report, Fakultat fur Mathematik und 
Informatik, TU Bergakademie, 1997. 



15 



TABLE VI 

Accuracy for the upper and lower levels, and the lower level calls from 1 1 runs of the proposed BLEAQ. 



Pr. No. 


Median 


Median 


Median 


LL Evals 
1,1, Calls 
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Fig. 10. Convergence plot for SMDl. 
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Pr. No. 
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UL 


LL 


UL 


LL 


UL 
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356 


TP7 
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3224 


267784 


4040 


296011 


5042 


TPS 


10796 


1288 


12300 


1446 


18086 


2080 
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Accuracy for the upper and lower levels, and the lower level calls from 1 1 runs of the proposed BLEAQ. 



Pr. 


Median 


Median 


Median 


LL Evals 
LL Calls 


UL Accuracy 


LL Accuracy 


LL Calls 
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TP2 


0.012657 (0.0) 


0.000126 (100.0) 
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TP3 


0.000000 (-18.678711) 


0.000000 (-1.015625) 


112 


40.00 


TP4 


0.040089 (-29.2) 


0.007759 (3.2) 


255 


60.00 


TP5 


0.008053 (-3.6) 


0.040063 (-2.0) 


203 


78.50 


TP6 


0.000099 (-1.2091) 


0.000332 (7.6145) 


233 


75.23 


TP7 


0.093192 (-1.98) 


0.093192 (1.98) 


3862 


95.42 


TPS 


0.001819 (0.0) 


0.000064 (100.0) 


200 


61.50 



16 



TABLE IX 

Comparison of BLEAQ against the results achieved by WJL approach. 



Pr. No. 


Mean LL Func. Evals. 


BLEAQ 


WJL 


Savings 
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5.27 


TP2 


17210 
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14.89 


TP3 


5256 


92526 


17.60 


TP4 


16690 


291817 


17.48 


TP5 


16738 


77302 


4.62 


TP6 
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9.24 


TP7 
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4.11 


TPS 


14105 


213522 


15.14 
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